Introduction


In this research, our basic goal has been to produce a useful, open-access, large database of
lesions in prostate cancer, organized in terms of segments of aberrant copy number, for
subsequent automated statistical analysis. As a primary example of such analysis, we aimed to
make this database easy to search for regions that harbor genes
likely to be causally related to the disease. We hoped to maximize the utility of this database by
optimizing the factors its design depends upon: cost, availability, efficiency, quality
control, and the ease with which it can be studied, navigated, and visualized.


Body 

Briefly,  we  have  made  significant  progress  in  all  of  these  directions: 

Key Research Accomplishments & Reportable Outcomes

• We have implemented operational software for oncogenomics (arrayCGH analysis).
We subsequently created a significantly improved version of this software, available to
NYU faculty (Ostrer-lab: Mr. Perlman, Ms. Salman) and a number of other
collaborators through an online service. This online service integrates the genome view
of copy number data with major sources of genome annotation such as NCBI’s MapView,
KEGG and AmiGO.

• We have also used this opportunity to provide computational training in the VALIS
platform to researchers at the Medical School. As an instructional exercise, we have
developed a simple tool for NYU Medical School labs to browse genomic data with copy
number variation.

•  We  have  received  internal  funding  from  NYU  Medical  School  for  one  year  and  plan  to 
release  the  software  through  their  portals. 

• A publication (accessible to biologists and clinical scientists) entitled “A versatile
statistical analysis algorithm to detect copy number variation,” describing our initial
work under this project, was published by PNAS and has attracted major collaborators
who are providing us with new data. Four follow-up publications have been or are being
submitted: (1) A better analysis algorithm for Affymetrix chips (10K, 100K and 500K); (2)
Detection of Tumor Suppressor Genes (with exact boundaries) from LOH data; (3) A
hyper-parametric model for cancer data and an algorithm based on this model that can
handle noisier technologies without biological replicates; and (4) A comparative analysis
of data produced by Ostrer-lab.

• The software has been released to the national research community through the lab’s
website and is planned for release under the “Bioconductor” open-access software
library.

• Dr. Daruwala et al., with support from this grant, have created “web-spidering” software to
aggregate the dispersed information on the Internet about the new Affymetrix CGH chip.
The database created from the collected information helps clinical experimentalists
better interpret the data.

• A computer science graduate student, Ms. Ionita, joined the group about a year ago and
has  made  significant  progress  towards  her  PhD  thesis  research  on  LOH  analysis  to 
detect  tumor  suppressor  genes. 

• We have designed algorithms to understand haplotype-copy-number-polymorphisms
(HCNP) in a population and in cancer patients. We have also designed experiments for
population studies and drawn up a detailed plan for data collection.

Conclusions

In  summary,  we  have  made  significant  and  visible  progress  towards  the  following  three  goals: 

1. We have developed new statistical methods to improve detection of abnormal lesions,
define  confidence  in  the  detected  lesions,  and  localize  putative  genes  involved  in  the 
cancer. 

2.  We  have  created  a  database  with  improved  statistical  significance.  It  has  an  enhanced 
human-computer  interface  in  order  that  the  users  (initially,  our  collaborating  teams  of 
scientists  and  clinicians)  can  effortlessly  maneuver  through  the  data  to  draw  conclusions. 

3.  Finally,  we  have  created  the  foundations  to  build  two  important  “bridges”  to  future  work. 
The  first  is  a  novel  statistical  algorithm  to  combine  the  genomic  data  with  whole  genome 
data  for  SNP  and  other  markers  (for  instance  indicating  LOH).  The  second  is  better  low- 
level  background  correction  software  that  makes  the  genomic  data  usable  without  too 
many expensive biological replicates. We now have software based on “redescription” that
provides the capability to easily combine the genomic data with gene-expression data.

• "Taming the Complexity of Biochemical Models through Bisimulation and Collapsing:
Theoretical Computer Science, 325(1): 45-67, 2004.

Academy of Science U S A, 101(46): 16292-7, 2004.

Chapters in books:

Professor Pravin Varaiya on his 65th birthday), (with C. Piazza), Systems & Control:
Foundations  &  Applications,  (Ed.  T.  Basar),  Birkhauser,  2005. 

•  "Simpathica:  A  Computational  Systems  Biology  Tool  within  the  Valis  Bioinformatics 
(Ed. E. Eiles and A. Kriete), Elsevier, 2005.

Refereed  journal  articles  accepted  for  publication 

•  "Validation  of  S.  pombe  sequence  assembly  by  micro-array  hybridization,"  (with  J.  West, 
W.  Casey,  and  M.  Wigler),  Journal  of  Computational  Biology,  2005. 

• "Multiple Biological Model Classification: From System Biology to Synthetic Biology,"
(with M. Antoniotti et al.), Transactions on Computational Systems Biology, 2005.

Research  Presentations 

•  LaserMED  Seminar,  Center  for  Catastrophe  Preparedness  and  Response,  NYU,  NYC, 
NY,  September  23,  2005.  “Large  Scale  Multi-Agent  Modeling  of  Catastrophes:  The 
Brazilian  Food-Poisoning  Scenario  &  Beyond” 

•  Bioinformatics  Seminar,  Cold  Spring  Harbor  Laboratory,  Long  Island,  NY,  September  21, 
2005.  “Ontology-Based  Analysis  of  Time-Course  Gene-Expression  Data.” 

Experimental Framework to Understand Disease Pathogenesis.”

Genome Project: Challenges for Bioinformatics.”

•  Biotechnology  Seminar,  Indian  Institute  of  Technology,  New  Delhi,  India,  July  12,  2005. 